Uncovering Interesting Attributed Anomalies in Large Graphs
Authors
Nan Li
Abstract
Graphs are a fundamental model for capturing entities and their relations in a wide range of applications. Examples of real-world graphs include the Web, social networks, communication networks, intrusion networks, collaboration networks, and biological networks. In recent years, with the proliferation of rich information available for real-world graphs, vertices and edges are often associated with attributes that describe their characteristics and properties. This gives rise to a new type of graph, namely the attributed graph. Anomaly detection has been extensively studied in many research areas and has important applications in real-world tasks such as financial fraud detection, spam detection, and cyber security. Anomaly detection in large graphs, especially graphs annotated with attributes, is still underexplored. Most existing work in this area focuses on the structural information of the graphs. In this thesis, we aim to address the following questions: How do we define anomalies in large graphs annotated with attributive information? How do we mine such anomalies efficiently and effectively? A succinct yet fundamental anomaly definition is introduced: given a graph augmented with vertex attributes, an attributed anomaly refers to a constituent component of the graph, be it a vertex, an edge, or a subgraph, exhibiting abnormal features that deviate from the majority of constituent components of the same nature, in a combined structural and attributive space.
For example, consider a social network containing a group of people, most of whom share similar taste in movies, whereas the majority of social groups in the network have very diverse interests in movies. Or consider a collaboration network containing a group of closely connected experts who together possess a set of required expertise, where such a group occurs rarely in the network. We consider the groups in both scenarios "anomalous". Applications of this research topic abound, including target marketing, recommendation systems, and social influence analysis. The goal of this work is therefore to create efficient solutions that effectively uncover interesting anomalous patterns in large attributed graphs. In service of this goal, we have developed several frameworks using two types of approaches: (1) combinatorial methods based on graph indexing and querying; (2) statistical methods based on probabilistic models and network regularization.
Similar resources
A Probabilistic Approach to Uncovering Attributed Graph Anomalies
Uncovering subgraphs with an abnormal distribution of attributes reveals much insight into network behaviors. For example, in social or communication networks, diseases or intrusions usually do not propagate uniformly, which makes it critical to find anomalous regions with high concentrations of a specific disease or intrusion. In this paper, we introduce a probabilistic model to identify anomal...
k-core decomposition: a tool for the analysis of large scale Internet graphs
We use the k-core decomposition, based on a recursive pruning of the least connected vertices, to study large scale Internet graphs at the Autonomous System level. This approach allows the characterization of progressively central cores of networks, conveniently uncovering hierarchical and structural properties. Internet maps show the noticeable property of having all k-cores consisting of a si...
Anomaly detection in data represented as graphs
An important area of data mining is anomaly detection, particularly for fraud. However, little work has been done in terms of detecting anomalies in data that is represented as a graph. In this paper we present graph-based approaches to uncovering anomalies in domains where the anomalies consist of unexpected entity/relationship alterations that closely resemble non-anomalous behavior. We have ...
Centralities in Large Networks: Algorithms and Observations
Node centrality measures are important in a large number of graph applications, from search and ranking to social and biological network analysis. In this paper we study node centrality for very large graphs, up to billions of nodes and edges. Various definitions for centrality have been proposed, ranging from very simple (e.g., node degree) to more elaborate. However, measuring centrality in b...
C-Explorer: Browsing Communities in Large Graphs
Community retrieval (CR) algorithms, which enable the extraction of subgraphs from large social networks (e.g., Facebook and Twitter), have attracted tremendous interest. Various CR solutions, such as k-core and CODICIL, have been proposed to obtain graphs whose vertices are closely related. In this paper, we propose the C-Explorer system to assist users in extracting, visualizing, and analyzin...
Publication date: 2013